Skip to content
This repository has been archived by the owner on Jul 13, 2023. It is now read-only.

feat: Use modern metrics #948

Merged
merged 1 commit into from
Jul 10, 2017
Merged

feat: Use modern metrics #948

merged 1 commit into from
Jul 10, 2017

Conversation

jrconlin
Copy link
Member

@jrconlin jrconlin commented Jul 5, 2017

@codecov-io
Copy link

codecov-io commented Jul 5, 2017

Codecov Report

Merging #948 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@          Coverage Diff           @@
##           master   #948    +/-   ##
======================================
  Coverage     100%   100%            
======================================
  Files          57     57            
  Lines        9605   9761   +156     
======================================
+ Hits         9605   9761   +156
Impacted Files Coverage Δ
autopush/main.py 100% <ø> (ø) ⬆️
autopush/web/base.py 100% <ø> (ø) ⬆️
autopush/web/webpush.py 100% <100%> (ø) ⬆️
autopush/db.py 100% <100%> (ø) ⬆️
autopush/tests/test_websocket.py 100% <100%> (ø) ⬆️
autopush/web/simplepush.py 100% <100%> (ø) ⬆️
autopush/utils.py 100% <100%> (ø) ⬆️
autopush/tests/test_integration.py 100% <100%> (ø) ⬆️
autopush/router/apnsrouter.py 100% <100%> (ø) ⬆️
autopush/web/registration.py 100% <100%> (ø) ⬆️
... and 13 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a938c02...39db9a7. Read the comment docs.

@@ -837,9 +820,10 @@ def _verify_user_record(self):
self.log.debug(format="Dropping User", code=104,
uaid_hash=self.ps.uaid_hash,
uaid_record=dump_uaid(record))
tags = list('code:104')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

str's iterable: you definitely want ['code:104'] instead

@@ -283,12 +284,26 @@ def _router_fail_err(self, fail):
self.log.debug(format="Success", status_code=exc.status_code,
logged_status=exc.logged_status or 0,
client_info=self._client_info)
success = True
self.metrics.increment('notification.message.success',
tags=dict(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

datadog expects tags as a list of strs, not dicts

..looks like we already had this mistake on 4 drop_user metrics -- causing them to never register into datadog AFAICT (yet kibana received the associated "Dropping User" logs)

web/webpush.py
119-                      uaid_record=dump_uaid(result))
120:            metrics.increment("updates.drop_user", tags={"errno": 102})
--
129-                      uaid_record=dump_uaid(result))
130:            metrics.increment("updates.drop_user", tags={"errno": 103})

websocket.py
841-            self.metrics.increment("client.drop_user",
842:                                   tags={"errno": 104})
--
855-                self.metrics.increment("client.drop_user",
856:                                       tags={"errno": 105})

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I thought these were being normalized. Will correct the others as well.

tags=dict(
destination="Direct",
router=router_type,
vapid=(vapid is True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vapid's a dict so this is always False

tags=self.base_tags)
self.metrics.gauge('ua.message_data', len(msg.get('data', '')),
tags={"source": msg.pop('source', 'Direct')})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you want notif.source vs msg.pop('source'). this could use a test

@@ -57,8 +58,11 @@ def setUp(self):
headers={"ttl": "0"},
host='example.com:8080')

mock_db = Mock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a test_db() in support you can also use, but it's not important here/this is fine

yield self._wait_for(lambda: len(self.metrics.mock_calls) > 0)
eq_(self.metrics.timing.call_args[0][0], 'ua.connection.lifespan')
# Wait for final cleanup (no error or metric produced)
yield sleep(1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these sleeps really necessary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly, yes. Otherwise we get coverage issues.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you keep the older _wait_for on more metrics emitted + the client.notify_uaid_failure check that should do it instead (and a little quicker than a full sleep(1) -- coverage is ultimately waiting on that client.notify_uaid_failure

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm, this might be another bit that got messed up in the patch I had to do to catch up. And looks like client.notify_uaid_failure isn't listed in the table. The metric had previously been removed.

@@ -61,9 +61,15 @@ def amend_endpoint_response(self, response, router_data):
"""Stubbed out for this router"""

def stored_response(self, notification):
self.metrics.gauge("notification.message_data",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these would be simplepush only as webpush's router overrides these methods.

Would this stat be interesting on external bridges too? If so it might make sense for the WebHandlers to be emitting these

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but I wanted to get these calls about as close to the point that the action occurs as possible. Everything goes through route_notification, and that's really the only time we can say with authority what the disposition is & have the data size.

I'll add the calls to the other methods as well. (Although bridge would pretty much only be "Direct")

... and just realized there's a bunch of metrics we missed for the audit.

@jrconlin jrconlin force-pushed the feat/943 branch 2 times, most recently from 8d10be0 to 46c3489 Compare July 6, 2017 05:16
tags=dict(
code=exc.status_code,
router=router_type,
vapid=(vapid is True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forgot this one and one in webpush

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handled in subsequent update

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still here

yield self._wait_for(lambda: len(self.metrics.mock_calls) > 0)
eq_(self.metrics.timing.call_args[0][0], 'ua.connection.lifespan')
# Wait for final cleanup (no error or metric produced)
yield sleep(1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you keep the older _wait_for on more metrics emitted + the client.notify_uaid_failure check that should do it instead (and a little quicker than a full sleep(1) -- coverage is ultimately waiting on that client.notify_uaid_failure

@@ -852,8 +836,9 @@ def _verify_user_record(self):
uaid_record=dump_uaid(record))
self.force_retry(self.db.router.drop_user,
self.ps.uaid)
self.metrics.increment("client.drop_user",
tags={"errno": 105})
tags = list('code:105')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forgot ['code:105'] here

self._client.increment(self._prefix_name(name), count, host=self._host,
**kwargs)

def gauge(self, name, count, **kwargs):
if 'tags' in kwargs:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitnitpick: tags=None explicitly in the func prototype is a lil cleaner

@jrconlin
Copy link
Member Author

jrconlin commented Jul 6, 2017

Reduced the sleeps to .25s. Granted, it's guesswork as to the optimal time required, but I'll admit that 1s is overkill.

@@ -144,8 +144,10 @@ def _route(self, notification, router_data):
apns_client.send(router_token=router_token, payload=payload,
apns_id=apns_id)
except (ConnectionError, AttributeError) as ex:
self.metrics.increment("updates.client.bridge.apns.connection_err",
self._base_tags)
self._base_tags.extend(['application:{}'.format(rel_channel),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't want to append to _base_tags, there's only one router instance per lifetime of the app (we could change routers' _base_tags to be class variables like cors_methods/response_handlers are in the Handlers, to make this explicit)

tags=dict(
code=exc.status_code,
router=router_type,
vapid=(vapid is True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still here

Copy link
Member

@pjenvey pjenvey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

web.webpush is also still emitting updates.drop_user which should be killed

@@ -89,19 +89,32 @@ def __init__(self, api_key, app_key, hostname, flush_interval=10,
def _prefix_name(self, name):
return "%s.%s" % (self._namespace, name)

def _normalize_tags(self, tags):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kill all this now that we'll be consistent w/ always passing lists of strs

self.metrics.increment(
"updates.client.bridge.apns.%s.sent" %
router_data["rel_channel"],
self._base_tags
)
self.metrics.gauge("notification.message_data",
len(notification.data or ""),
tags={'destination': 'Direct'})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tags=['destination:Direct'] -- there's about 8 of these to fix in this diff

self.metrics.timing("notification.request_time",
duration=time_diff)
self.metrics.increment('notification.message.success',
tags=dict(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

convert to list too

tags=[
"code:{}".format(exc.status_code),
"router:{}".format(router_type),
"vapid:{}".format(vapid is True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's still 2 bogus "vapid is True" in this patch: here and in webpush _router_completed. actually, both of these methods don't even need the full vapid dict. they just need a bool flag

webpush.py can be repsonsible for "vapid = jwt is not None" and pass that along to them

dest = 'Stored'
self.metrics.timing("notification.request_time",
duration=time_diff)
self.metrics.increment('notification.message.success',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just occurred to me that web.simplepush's missing this one

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

@jrconlin
Copy link
Member Author

jrconlin commented Jul 7, 2017

Went ahead and created the make_tags func since you've asked that we drop all dicts from tags.

which means closing #950

@@ -74,6 +74,14 @@ def timing(self, name, duration, **kwargs):
self._metric.timing(name, duration)


def make_tags(base=[], **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

although this is a safe use of a list as a default kwarg, might just wanna use base=None instead (kinda surprised flake8 didn't complain, I think PyCharm will)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, neither did. But, yeah, I keep forgetting about not doing that.

pjenvey
pjenvey previously approved these changes Jul 7, 2017
Copy link
Member

@pjenvey pjenvey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could use make_tags in a couple more spots I suppose but this'll work

pjenvey
pjenvey previously approved these changes Jul 7, 2017
@@ -678,7 +667,6 @@ def get_uaid(self, uaid):
# We unfortunately have to catch this here, as track_provisioned
# will not see this, since JSONResponseError is a subclass and
# will capture it
self.metrics.increment("error.provisioned.get_uaid")
raise
except JSONResponseError: # pragma: nocover
# We trap JSONResponseError because Moto returns text instead of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should stop using moto in the future, then we can remove this entirely since it only really applies to testing.

return RouterResponse(202, "Notification Stored")

def delivered_response(self, notification):
self.metrics.gauge("notification.message_data",
len(notification.data or ""),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of wonder if maybe notification should have a property data_length that does this.

success = True
self.metrics.increment('notification.message.success',
tags=[
'destination:"Direct"',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Value has double quotes around it but none of the others do?

@bbangert
Copy link
Member

bbangert commented Jul 7, 2017

I think only the last comment really needs to be addressed before merge, as we should be consistent in value quoting for tags.

Convert all tags to lists via make_tags

Closes #943, #950
@bbangert bbangert merged commit 3e4a2a1 into master Jul 10, 2017
@bbangert bbangert deleted the feat/943 branch July 10, 2017 21:08
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants